Adaptive Representations for Tracking Breaking News on Twitter
Twitter is often the most up-to-date source for finding and tracking breaking
news stories. Therefore, there is considerable interest in developing filters
for tweet streams in order to track and summarize stories. This is a
non-trivial text analytics task as tweets are short, and standard retrieval
methods often fail as stories evolve over time. In this paper we examine the
effectiveness of adaptive mechanisms for tracking and summarizing breaking news
stories. We evaluate the effectiveness of these mechanisms on a number of
recent news events for which manually curated timelines are available.
Assessments based on ROUGE metrics indicate that adaptive approaches are
best suited for tracking evolving stories on Twitter.
Comment: 8 pages
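The abstract evaluates summaries against curated timelines with ROUGE. As a rough illustration of what a ROUGE-N recall score measures, the sketch below computes clipped n-gram overlap against a single reference using whitespace tokenisation. It is a simplified stand-in, not the official ROUGE toolkit (which supports options such as stemming and multiple references), and not necessarily the configuration used in the paper. The function name is hypothetical.

```python
from collections import Counter


def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """Simplified ROUGE-N recall: clipped n-gram overlap / number of reference n-grams."""

    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is credited at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

For example, `rouge_n_recall("police close the bridge", "police have closed the bridge")` scores 0.6, because three of the five reference unigrams ("police", "the", "bridge") appear in the candidate.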
Dimensionality Reduction and Visualisation Tools for Voting Records
Recorded votes in legislative bodies are an important source of data for political scientists. Voting records can be used to describe parliamentary processes, identify ideological divides between members and reveal the strength of party cohesion. We explore the problem of working with vote data using popular dimensionality reduction techniques and cluster validation methods, as an alternative to more traditional scaling techniques. We present results of dimensionality reduction techniques applied to votes from the 6th and 7th European Parliaments, covering activity from 2004 to 2014.
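Principal component analysis is one widely used dimensionality reduction technique of the kind the abstract refers to. The sketch below assumes votes are encoded as +1 (yes), -1 (no) and 0 (abstain or absent) in a members-by-votes matrix, and projects each member onto the first principal component using power iteration in plain Python. The encoding, the function name and the iteration count are illustrative assumptions; the paper's own choice of techniques and preprocessing may differ.

```python
import math
import random


def first_principal_component(votes, iters=200, seed=0):
    """Return each member's score on the first principal component of a members x votes matrix."""
    n_rows, n_cols = len(votes), len(votes[0])
    # Centre each vote column so the component captures variation rather than the mean.
    means = [sum(row[j] for row in votes) / n_rows for j in range(n_cols)]
    x = [[row[j] - means[j] for j in range(n_cols)] for row in votes]

    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in range(n_cols)]
    for _ in range(iters):
        # Power iteration on X^T X without forming it explicitly: w = X^T (X v).
        xv = [sum(a * b for a, b in zip(row, v)) for row in x]
        w = [sum(x[i][j] * xv[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(c * c for c in w)) or 1.0
        v = [c / norm for c in w]
    # Project each centred member onto the (unit-length) leading direction.
    return [sum(a * b for a, b in zip(row, v)) for row in x]
```

Members who vote together end up with similar scores, and opposing blocs land on opposite sides of zero. The overall sign of a principal component is arbitrary, so only relative positions are meaningful.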
First international workshop on recent trends in news information retrieval (NewsIR'16)
The news industry has gone through seismic shifts in the past decade, with digital content and social media completely redefining how people consume news. Readers check for accurate, fresh news from multiple sources throughout the day using dedicated apps or social media on their smartphones and tablets. At the same time, news publishers rely more and more on social networks and citizen journalism as a front line for breaking news. In this new era of fast-flowing instant news delivery and consumption, publishers and aggregators have to overcome a great number of challenges. These include the verification or assessment of a source's reliability; the integration of news with other sources of information; real-time processing of both news content and social streams in multiple languages, in different formats and in high volumes; deduplication; entity detection and disambiguation; automatic summarization; and news recommendation. Although Information Retrieval (IR) applied to news has been a popular research area for decades, fresh approaches are needed due to the changing type and volume of media content available and the way people consume this content. The goal of this workshop is to stimulate discussion around new and powerful uses of IR applied to news sources and the intersection of multiple IR tasks to solve real user problems. To promote research efforts in this area, we released a new dataset consisting of one million news articles to the research community and introduced a data challenge track as part of the workshop.
From Detection to Discourse: Tracking Events and Communities in Breaking News
Online social networks are now an established part of our reality. People no longer rely solely on traditional media outlets to stay informed. Collectively, acts of citizen journalism have transformed news consumers into producers. Keeping up with the overwhelming volume of user-generated content from social media sources is challenging even for well-resourced news organisations, and filtering the most relevant content is not trivial. Significant demand exists for editorial support systems that enable journalists to work more effectively. Social newsgathering introduces many new challenges to the tasks of detecting and tracking breaking news stories. In detection, substantial volumes of data introduce scalability challenges. When tracking developing stories, approaches developed on static collections of documents often fail to capture important changes in the content or structure of data over time. Furthermore, systems tuned on static collections can perform poorly on new, unseen data. To understand significant events, we must also consider the people and organisations who are generating content related to these events. Newsworthy sources are rarely objective and neutral, and some are purposefully created for disinformation, giving rise to the "fake news" phenomenon. An individual's political ideology will inform and influence their choice of language, especially during significant political events such as elections, protests and other polarising incidents. This thesis presents techniques developed to support journalists who monitor social media for breaking news. The work starts with the curation of newsworthy sources, moves on to an alert system for breaking news events and to tracking the evolution of these stories over time, and finally explores the language used by different communities to gain insights into the discourse around an event.
As well as detecting and tracking significant events, it is of interest to identify differences in language patterns between groups of people around those events. Distributional semantic language models offer a way to quantify certain aspects of discourse, allowing us to track how different communities use language and thereby reveal their stances on key issues.
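To make the idea of quantifying community language concrete, the sketch below builds a simple count-based distributional vector for a target word in each community's corpus (the words occurring within a fixed window around it) and compares the two vectors with cosine similarity. A low similarity suggests the communities use the word in different contexts. This count-based model, the window size and the function names are hypothetical choices for illustration; the distributional semantic language models used in the thesis are not reproduced here.

```python
import math
from collections import Counter


def context_vector(corpus, target, window=2):
    """Count words appearing within `window` tokens of `target` across a list of documents."""
    vec = Counter()
    for doc in corpus:
        tokens = doc.lower().split()
        for i, tok in enumerate(tokens):
            if tok == target:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                # Skip the target position itself; count its neighbours.
                vec.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

A usage pattern would be `cosine(context_vector(community_a, "border"), context_vector(community_b, "border"))`, where `community_a` and `community_b` are lists of posts from two groups. With realistic data, vectors would normally be weighted (for example with PMI) before comparison.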
Detecting Attention Dominating Moments Across Media Types - Tweet Stream
Tweets spanning September 2015. A parallel corpus to the Signal Media 1 Million Articles set for NewsIR'16.
Event Detection in Twitter using Aggressive Filtering and Hierarchical Tweet Clustering
Second Workshop on Social News on the Web (SNOW), Seoul, Korea, 8 April 2014
Twitter has become as much a news medium as a social network, and much research has turned to analysing its content to track real-world events, from politics to sports and natural disasters. This paper describes the techniques we employed for the SNOW Data Challenge 2014 (described in [16]). We show that aggressive filtering of tweets based on length and structure, combined with hierarchical clustering of tweets and ranking of the resulting clusters, achieves encouraging results. We present empirical results and discussion for two different Twitter streams, focusing on the 2012 US presidential elections and on the recent events concerning Ukraine, Syria and Bitcoin in February 2014.
Science Foundation Ireland
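The filter-then-cluster pipeline described above can be sketched as follows. `keep_tweet` drops tweets with too few ordinary words or too many mentions and hashtags. `cluster` performs average-linkage agglomerative clustering over bag-of-words cosine similarity, stops when no pair of clusters is similar enough, and ranks the resulting clusters by size. The function names, every threshold, the specific structural rules and the size-based ranking are hypothetical choices for illustration, not the filters, parameters or ranking scheme used in the paper.

```python
import math
from collections import Counter
from itertools import combinations


def keep_tweet(text, min_words=5, max_mentions=2, max_hashtags=2):
    """Aggressive filter on length and structure (illustrative thresholds)."""
    tokens = text.split()
    mentions = sum(t.startswith("@") for t in tokens)
    hashtags = sum(t.startswith("#") for t in tokens)
    words = len(tokens) - mentions - hashtags
    return words >= min_words and mentions <= max_mentions and hashtags <= max_hashtags


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(tweets, threshold=0.5):
    """Average-linkage agglomerative clustering; returns clusters ranked by size."""
    vecs = [Counter(t.lower().split()) for t in tweets]
    clusters = [[i] for i in range(len(tweets))]

    def avg_sim(c1, c2):
        return sum(cosine(vecs[i], vecs[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the most similar pair of clusters (x < y, so deleting y keeps x valid).
        (x, y), best = max(
            (((p, q), avg_sim(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    # Rank clusters with larger ones first (size as a simple proxy for importance).
    return sorted(([tweets[i] for i in c] for c in clusters), key=len, reverse=True)
```

In a full pipeline the stream would first be passed through `keep_tweet`, and only the surviving tweets would be clustered. The cubic-time merge loop is fine for a sketch but would need windowing or an indexed nearest-neighbour search on a real Twitter stream.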
Detecting Attention Dominating Moments Across Media Types
NewsIR'16 Workshop at ECIR, Padua, Italy, 20 March 2016
In this paper we address the problem of identifying attention dominating moments in online media. We are interested in discovering moments when everyone seems to be talking about the same thing. We investigate one particular aspect of breaking news: the tendency of multiple sources to concentrate attention on a single topic, leading to a collapse in diversity of content for a period of time. In this work we show that diversity at a topic level is effective for capturing this effect in blogs, in news articles, and on Twitter. The phenomenon is present in three distinctly different media types, each with its own distinctive features. We describe the phenomenon using case studies relating to major news stories from September 2015.
Science Foundation Ireland
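One way to measure a topic-level collapse in diversity is the Shannon entropy of the topic distribution within each time window. When most items share one topic, the entropy drops. The sketch below assumes every item already carries a topic label (for example from a topic model) and flags windows whose entropy falls below a fraction of the median. The entropy measure, the `ratio` threshold and the function names are assumptions for illustration, not the paper's exact diversity measure or detection rule.

```python
import math
import statistics
from collections import Counter


def topic_entropy(topic_labels):
    """Shannon entropy (bits) of the topic distribution in one time window."""
    counts = Counter(topic_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def attention_dominating_windows(windows, ratio=0.5):
    """Return indices of windows whose topic entropy is below ratio x the median entropy."""
    entropies = [topic_entropy(w) for w in windows]
    median = statistics.median(entropies)
    return [i for i, h in enumerate(entropies) if h < ratio * median]
```

With four equally used topics a window has an entropy of 2 bits. A window in which every item shares one topic has an entropy of 0 and is flagged.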
A system for Twitter user list curation
The ACM Conference on Recommender Systems (RecSys-2012), Dublin, Ireland, 9-13 September 2012
With increased adoption of social networking tools, it is becoming more difficult to extract useful information from the mass of data generated daily by users. Curation of content and sources is an important filter for separating the signal from the noise. A good set of credible sources often requires painstaking manual curation, which can still yield incomplete coverage of a topic. In this demo, we present a recommender system to aid this process, improving the quality and quantity of sources. The system is highly adaptable to the goals of the curator, enabling some novel uses for curating and monitoring lists of users.
Science Foundation Ireland
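As a hypothetical illustration of recommending sources for a curated list, the sketch below ranks accounts by how many current list members follow them, excluding accounts already on the list. The follow-graph input format, the co-following score and the function name are assumptions made for the example. The abstract does not describe how the demo system actually scores candidates.

```python
from collections import Counter


def recommend_candidates(list_members, follows, top_k=5):
    """Rank accounts followed by many current list members (hypothetical scoring).

    follows maps each account name to the accounts it follows.
    """
    members = set(list_members)
    scores = Counter()
    for member in list_members:
        for followed in follows.get(member, ()):
            # Only recommend accounts not already on the curated list.
            if followed not in members:
                scores[followed] += 1
    return [user for user, _ in scores.most_common(top_k)]
```

A curator-specific variant might weight each member's vote or add topical filters. That kind of adaptability is what the abstract emphasises, but the weighting here is left deliberately simple.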
Real-time event monitoring with Trident
RealStream: Real-World Challenges for Data Stream Mining workshop at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD 2013), Prague, 23-27 September 2013
Building a scalable, fault-tolerant stream mining system that deals with realistic data volumes presents unique challenges. Considerable work is being done to make the development of such systems simpler by creating high-level abstractions on top of existing systems. Many of the technical barriers can be eliminated by adopting a state-of-the-art interface, such as the Trident API for Storm. We describe a stream mining tool, based on Trident, for monitoring breaking news events on Twitter, which can be extended quickly and scaled easily.
Science Foundation Ireland
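Trident expresses a computation as per-tuple functions (`each`), grouping (`groupBy`) and aggregation into persistent state (`persistentAggregate`). The plain-Python sketch below mirrors that dataflow for a hypothetical breaking-news alert. Each batch of tweets is split into terms and counted, and a term is flagged when its batch count reaches a minimum and is at least `ratio` times its average over earlier batches. It does not use the Trident API. The class name, the burst rule and the parameters are illustrative assumptions, not the tool described in the paper.

```python
from collections import Counter


class TermCountState:
    """In-memory stand-in for persisted per-term counts updated batch by batch."""

    def __init__(self):
        self.totals = Counter()
        self.batches = 0

    def process_batch(self, tweets, min_count=3, ratio=2.0):
        # "each": split tweets into lower-cased terms; "groupBy" + aggregate: count per term.
        counts = Counter(term.lower() for tweet in tweets for term in tweet.split())
        alerts = []
        for term, c in counts.items():
            mean_before = self.totals[term] / self.batches if self.batches else 0.0
            if c >= min_count and c >= ratio * mean_before:
                alerts.append(term)
        # Update the persistent state only after scoring, so each batch is compared with its past.
        self.totals.update(counts)
        self.batches += 1
        return sorted(alerts)
```

Because there is no history on the first batch, any term reaching `min_count` is flagged there. A deployed system would typically warm up on a background period first. In Trident itself, the per-term counts would live in a partitioned state rather than a single in-process `Counter`, which is what lets the real tool scale across workers.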